10 research outputs found
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
Comment: Published at AAAI 202
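The risk-constrained action selection mentioned above can be illustrated with a small sketch. The linear program has one distribution variable per action, an expected-payoff objective, and a single risk constraint; such an LP always has an optimum mixing at most two actions, so it can be solved by enumeration. The function below is a hypothetical illustration, not the paper's implementation; the payoff estimates `q`, risk estimates `r`, and the threshold `delta` stand in for quantities the predictor would supply.

```python
def risk_constrained_policy(q, r, delta):
    """Maximize sum_a x[a]*q[a] subject to sum_a x[a]*r[a] <= delta,
    sum_a x[a] = 1, x >= 0. Because there is one equality and one risk
    constraint, some optimal solution mixes at most two actions, so we
    enumerate single actions and two-action mixtures.
    Returns (value, {action: probability}), or (None, None) if infeasible."""
    n = len(q)
    best, best_x = None, None
    # Pure actions that satisfy the risk bound on their own.
    for a in range(n):
        if r[a] <= delta and (best is None or q[a] > best):
            best, best_x = q[a], {a: 1.0}
    # Mixtures of a safe action a and a risky action b, tight at delta.
    for a in range(n):
        for b in range(n):
            if r[a] <= delta < r[b]:
                w = (delta - r[a]) / (r[b] - r[a])  # weight on b
                val = (1 - w) * q[a] + w * q[b]
                if best is None or val > best:
                    best, best_x = val, {a: 1.0 - w, b: w}
    return best, best_x
```

For example, with payoffs `[1.0, 5.0]`, risks `[0.0, 0.5]`, and threshold `0.1`, the optimum plays the risky action with probability 0.2, achieving value 1.8 while keeping the failure probability exactly at the threshold.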
Verification of Open Interactive Markov Chains
Interactive Markov chains (IMC) are compositional behavioral models extending both labeled transition systems and continuous-time Markov chains. IMC pair modeling convenience, owed to their compositionality properties, with effective verification algorithms and tools, owed to their Markov properties. Thus far, however, IMC verification has not exploited compositionality, but has instead considered closed systems. This paper discusses the evaluation of IMC in an open and thus compositional interpretation. To this end, we embed the IMC into a game that is played with the environment. We devise algorithms that enable us to derive bounds on reachability probabilities that are assured to hold in any composition context.
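The idea of bounds that hold in any composition context can be illustrated in a simplified discrete setting: if the environment resolves the nondeterminism, then the minimal and maximal reachability probabilities over all its resolutions bound what any context can induce. The sketch below computes such bounds by value iteration on a small MDP-like model; it is an assumption-laden simplification of the paper's game-based approach, not its algorithm, and the encoding of `trans` is invented for illustration.

```python
def reachability_bounds(trans, target, iters=1000):
    """trans[s] is a list of environment choices; each choice is a list
    of (probability, successor) pairs. Returns per-state (lower, upper)
    bounds on the probability of reaching a state in `target`, taken as
    min/max over all resolutions of the nondeterminism."""
    states = list(trans)
    lo = {s: 1.0 if s in target else 0.0 for s in states}
    hi = dict(lo)
    for _ in range(iters):
        for s in states:
            if s in target:
                continue
            vals_lo = [sum(p * lo[t] for p, t in act) for act in trans[s]]
            vals_hi = [sum(p * hi[t] for p, t in act) for act in trans[s]]
            lo[s] = min(vals_lo) if vals_lo else 0.0
            hi[s] = max(vals_hi) if vals_hi else 0.0
    return lo, hi
```

For instance, a state with one choice reaching the goal with probability 0.5 and another reaching it surely gets the bound interval [0.5, 1.0]: no environment can push the probability outside it.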
Analysis and Prediction of the Long-Run Behavior of Probabilistic Sequential Programs with Recursion (Extended Abstract)
We introduce a family of long-run average properties of Markov chains that are useful for purposes of performance and reliability analysis, and show that these properties can effectively be checked for a subclass of infinite-state Markov chains generated by probabilistic programs with recursive procedures. We also show how to predict these properties by analyzing finite prefixes of runs, and present an efficient prediction algorithm for the mentioned subclass of Markov chains.
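For a finite irreducible Markov chain, the simplest instance of a long-run average property is the stationary average of a per-state reward, which can be approximated by power iteration on the transition matrix. The sketch below is a finite-state illustration of the kind of quantity the paper studies; the paper itself targets infinite-state chains arising from recursive programs, where this direct computation is not available.

```python
def long_run_average(P, reward, iters=10000):
    """Approximate the long-run average reward of a finite irreducible,
    aperiodic Markov chain: compute the stationary distribution by power
    iteration on the row-stochastic matrix P, then average the rewards."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(pi[i] * reward[i] for i in range(n))
```

For the two-state chain with rows `[0.9, 0.1]` and `[0.5, 0.5]`, the stationary distribution is (5/6, 1/6), so with rewards `[0, 6]` the long-run average reward is 1.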
On the Memory Consumption of Probabilistic Pushdown Automata
We investigate the problem of evaluating memory consumption for systems modelled by probabilistic pushdown automata (pPDA). The space needed by a run of a pPDA is the maximal height reached by the stack during the run. The problem is motivated by the investigation of depth-first computations that play an important role for space-efficient schedulings of multithreaded programs.
We study the computation of both the distribution of the memory consumption and its expectation. For the distribution, we show that a naive method incurs an exponential blow-up, and that it can be avoided using linear equation systems. We also suggest a possibly even faster approximation method. Given an error tolerance ε, these methods allow us to compute bounds on the memory consumption that are exceeded with a probability of at most ε.
For the expected memory consumption, we show that whether it is infinite can be decided in polynomial time for stateless pPDA (pBPA) and in polynomial space for pPDA. We also provide an iterative method for approximating the expectation. We show how to compute error bounds for our approximation method and analyze its convergence speed. We prove that our method converges linearly, i.e., the number of accurate bits of the approximation is a linear function of the number of iterations.
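Quantities of pPDA are typically characterized as least fixed points of polynomial equation systems and approximated by iteration, which is also where linear convergence shows up. As a toy illustration (not the paper's memory-consumption algorithm), consider a pBPA with one symbol X and rules X → ε with probability p and X → XX with probability 1−p; its termination probability is the least fixed point of x = p + (1−p)x², computable by Kleene iteration.

```python
def termination_prob(p, iters=100):
    """Least fixed point of x = p + (1-p)*x**2, approximated by Kleene
    iteration from 0. This is the termination probability of the pBPA
    with rules X -> eps (prob. p) and X -> XX (prob. 1-p)."""
    x = 0.0
    for _ in range(iters):
        x = p + (1 - p) * x * x
    return x
```

For p = 0.4 the least fixed point is 2/3 (the run may grow the stack forever), while for any p > 1/2 it is 1. Away from the critical case p = 1/2, each iteration shrinks the error by a constant factor, matching the linear-convergence behavior described above.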